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Abstract: A prerequisite for any system that enhances drivers' awareness of road conditions 
and threatening situations is the correct sensing of the road geometry and the vehicle's 
relative pose with respect to the lane despite shadows and occlusions. In this paper we 
propose an approach for lane segmentation and tracking that is robust to varying shadows 
and occlusions. The approach involves color-based clustering, the use of MS AC for outlier 
removal and curvature estimation, and also the tracking of lane boundaries. Lane boundaries 
are modeled as planar curves residing in 3D-space using an inverse perspective mapping, 
instead of the traditional tracking of lanes in the image space, i.e., the segmented lane 
boundary points are 3D points in a coordinate frame fixed to the vehicle that have a depth 
component and belong to a plane tangent to the vehicle's wheels, rather than 2D points in 
the image space without depth information. The measurement noise and disturbances due to 
vehicle vibrations are reduced using an extended Kalman filter that involves a 6-DOF motion 
model for the vehicle, as well as measurements about the road's banking and slope angles. 
Additional contributions of the paper include: (i) the comparison of textural features obtained 
from a bank of Gabor filters and from a GMRF model; and (ii) the experimental validation 
of the quadratic and cubic approximations to the clothoid model for the lane boundaries. The 
results show that the proposed approach performs better than the traditional gradient-based 
approach under different levels of difficulty caused by shadows and occlusions. 

Keywords: road sensing; lane detection and tracking; lane departure warning; mean-shift 
clustering; gabor filters; Gaussian Markov Random Fields; RANSAC 
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1. Introduction 

According to the World Health Organization (WHO), about 1.2 million people get killed in traffic 
accidents each year worldwide, while the number of injured is estimated to be 50 million. On the other 
hand, the cost of road accidents to national economies has been estimated to roughly correspond to 1% 
of their gross national product for low-income countries, 1.5% for middle-income countries and 2% 
for high- income countries, accruing to an estimated global of over US$ 518 billion each year [1]. In 
this context, the development of driver assistance systems (DAS) that address the main risk factors and 
collision causes is essential to further reduce the number of road accidents. 

Different traffic control, road safety and driver assistance systems have been proposed and developed 
during the last decade; see recent surveys in [2-4]. Some DAS focus on the driver, employing biometric 
measures of the driver's performance like alert state and fatigue and awareness levels [5,6]. Some 
others focus on traffic monitoring and control schemes [2,7] to improve road safety. A third group 
aims at increasing situational awareness with different sensing technologies, mainly combining visible 
spectrum cameras with image processing and computer vision techniques [4,8,9]. Among the different 
road and traffic perception systems, lane recognition is a prerequisite for lane departure warning and a 
fundamental enabler for advanced DAS [10,1 1]. However, most of the existing vision-based approaches 
rely mainly on the analysis of the spatial gradient of the road image to extract lane boundaries. The 
main drawback of these methods is that if the road structure is not regular or well-delimited, or if there 
are shadows and occlusions, then it may not be easy to extract the edges correctly and reliably. In this 
paper, we propose a vision-based lane detection and tracking method capable of estimating the lane 
geometry and relative vehicle position even if lane boundaries are not clear due to shadows, changes 
in illumination or partial occlusions caused by other vehicles. The paper also aims at answering some 
practical issues that are not conclusively addressed in the existing literature, which can be summarized 
in the following questions: Is textural information of the pavement useful for segmenting the road in 
a more robust way than classic schemes that rely on edge detection using gradient computations? Is 
color-based clustering reliable enough for segmenting the pavement region despite changes in hue and 
intensity? Can the complex clothoid road models be simplified? Is a quadratic or cubic polynomial 
approximation to the clothoid model good enough? How can the edge geometry be estimated when there 
are broken (discontinuous) edges due to occlusions, shadows, changes in illumination or lack of contrast? 
In prior approaches, textural features were not considered due to their higher computational cost, which 
limited their application on older and less powerful processors. However, very recent work [11] shows 
an increasing interest on texture. More specifically, the approach proposed in [11] relies on a cost 
function to label drivable and non-drivable road regions. The cost function takes into account ground 
plane discontinuities and texture descriptors based on Markov random fields to improve the robustness 
of the drivable region segmentation. 

The proposed approach first segments the area corresponding to the road employing hue-intensity 
clustering and textural features. Using the segmented pavement region, and inverse perspective mapping 
and the MSAC variant of the RANSAC algorithm, the lane geometry and position relative to the vehicle 
in 3D coordinates is obtained. The estimation of the lane geometry and vehicle is further improved with 
an extended Kalman filter (EKF) applied to the features in 3D space taking also into account the motion 
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model of the vehicle. It is to be noted that the lane boundaries are modeled as curves contained in a 
2D plane residing in 3D space. This allows to take advantage of computationally simple homography 
transformations between the planar road model and the imaging sensor plane. However, the dynamic 
model of the vehicle takes into account the slope and bank angle of the road measured with a gyroscope. 
Even if the proposed road model only takes into account the curvature of the road in the plane tangent 
to the vehicle's wheels and does not consider the road's geodesic and torsional curvature in the standard 
Darboux frame formulation, the proposed system is compared with previous methods and shown to be 
robust under a wide range of conditions including quality of lane marks (if they exist), lighting conditions 
and road occlusion by other vehicles. The proposed lane sensing system should help to enhance the 
safety of drivers and pedestrians by preventing unintended lane changes due to distracted driving or 
reducing risky maneuvers due to excessive speed for a given lane curvature. Another contribution of 
this paper is the comparison of the textural features considered, which were generated with two textural 
models: (i) Gabor features; (ii) a Gaussian Markov random field model. Textural features have not been 
exploited enough due to their computational cost and the lack of computational power in the past, but 
are an important aspect for making the road segmentation more robust under low illumination levels. A 
more detailed review of the existing vision-based approaches to lane detection is presented in the next 
section. The remainder of the paper is organized as follows. Section 3 explains the proposed approach. 
The testing methodology and results are presented in Section 4, followed by the conclusions in the last 
section. The paper summarizes several results in the lane detection literature, which complement the 
recent surveys in [10,12], and presents a clear derivation of the Inverse Perspective Mapping and lane 
model equations in Section 3.3. The proposed approach relies on well-known mathematical tools or 
models, some of which have been employed in previous work in the field of lane detection, and thus this 
article should also have a reference and tutorial value. 

2. Existing Approaches 

The main sensing devices that have been proposed for lane detection and tracking include 
radars [13-15], laser range finders [16,17], active infrared sensors that detect lane crossings by measuring 
the IR LED light reflected by the road markings [18,19], and standard visible spectrum perspective 
cameras [10,12]. Although radar and laser based sensors are very good at detecting the road boundaries 
(curbs, berms) as well as other vehicles, regardless of the illumination conditions, their major drawbacks 
are their elevated cost compared with the alternative sensing modalities and the fact that lane markings 
only have color information and no 3D structure that could be obtained by the volume measuring sensors. 
On the other hand, reflective IR sensors only work close to the lane markings and have no real look-ahead 
capability. Because of these disadvantages, cameras are the most frequently employed modality for lane 
and road sensing. Some lane detection and tracking systems employ multiple cameras to perform stereo 
imaging and obtain 3D information or combine different sensing devices [11]. However, since human 
drivers are able to process adequately visual data under a wide range of driving conditions without 
requiring 3D information, here we will focus on the monocular vision-based approach. 

Most vision-based approaches for lane detection extract lane edges and geometry information using 
gradient thresholding [20-22] and improvements relying on steerable filters for detecting solid-line 
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and segmented-line marks [12]. Some other approaches have proposed maximizing the likelihood of 
deformable shape models of the road [23] and similar template matching methods [24]. The use of 
color, gray-scale intensities and textural cues of the pavement has also been suggested in the past, 
however, using color information or pixel intensities alone is not sufficient because this information 
varies significantly with illumination and environmental conditions. For this reason, the first approaches 
introduced simple texture descriptors based on local covariance matrices of the pixel intensities [25,26] 
or neural network classifiers [27]. More recently, Cheng et al. [28] use lane marking color cues for 
structured roads, but apply mean-shift for segmenting unstructured roads. Other recent approaches have 
also considered Gabor features with A;-means clustering [29]. 

In order to cope with road variability and render lane estimation more robust to external disturbances 
(e.g., illumination, visibility, lack of road structure), lane position tracking can incorporate Kalman 
filters and sensor fusion techniques based on the vehicle's state monitoring [15] and GPS measures [30]. 
Some researchers have also considered the importance of using computationally simpler but robust road 
models, for example, Ruyi et al. [31] propose an approach that efficiently combines a modified distance 
transform to obtain the road edges' location together with a particle filter to track the edges' information. 
The reader is referred to [10,12] for surveys on the recent progress in lane detection and tracking. We 
complement these surveys with a brief summary presented in Table 1 . 

3. Proposed Lane Detection and Tracking Approach 

The proposed approach involves the following main steps, which are explained in more detail in 
the following sections. First, the pavement area is segmented using the mean-shift clustering approach 
or employing color and texture information together with a multivariate Gaussian classifier. Standard 
morphological operations are applied to the segmentation in order to remove small isolated regions that 
do not belong to the pavement area and fill small holes in the pavement area produced by misclassified 
pixels, hi the second step, the edges obtained from the segmented pavement are projected back onto the 
road plane in 3D world coordinates using an inverse perspective projection. The parameters of the curves 
that model the road boundaries are obtained using the MS AC variant of the RANSAC algorithm [32]. 
An Extended Kalman filter (EKF) is employed to track the lane position in 3D coordinates. Finally, 
a rule-based approach is used to estimate lane departure situations. The results of the lane detection 
and tracking can be projected from world coordinates to the image plane, thus enhancing the driver's 
understanding of the road scene and providing a visual confirmation of the system's performance. 

3.1. Road Segmentation 

Three different methods were implemented to segment the road: A color clustering approach 
employing the mean-shift algorithm and two approaches using color and texture features modeled as 
the response energies of a bank of Gabor filters or as the parameters of a Gaussian Markov random field 
(GMRF). These three methods are briefly explained next. 
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Table 1. Comparison of a selection of lane detection and tracking approaches published 
in 2004-2012 (see the recent surveys [10,12] for in-depth discussions about foundational and 
influential developments in lane detection). 



System 


Use " 


Road Model 


Feature 
Extraction 


Postprocessing 


Tracking 


Evaluation 
i?i/Frames/ 
fps/Positioning 
Accuracy ^ 


Wijesoma et al. 
(2004) [17] 


C 


Quadratic curve 
on planar road 


2D Ladar measure- 
ments 


Thresholding of 
measured data 


EKE 


£;2/N.A./ 
10/N.A. 


Springrobot 
(2004) [33] 


AB 


Circular arcs on 
planar road 


Edge elements 


Adaptive 
Randomized 
Hough transform 


N.A. 


Si/N.A./ 
N.A./N.A. 


Wang et al. 
(2005) [34] 


B 


Polynomial curve 
on planar road 


Edge elements 


GPS measures 
integration 


KF 


EJN.K. 
/N.A./N.A. 


Kim (2008) 
[35] 


C 


Cubic-spline 
curve on planar 
road 


Lane-marking 
classifiers 


RANSAC curve 
fitting 


PF 


Si/1800/ 
10/N.A. 


Danescu et al. 
(2009) [36] 


B 


3D model consid- 
ering vertical and 
horizontal curva- 
ture 


Image gradient 
from stereo 
cameras 


Probability 

density 

estimation 


PF 


£:i/4200/ 
10/N.A. 


Amditis et al. 
[37] 


B 


Clothoid on pla- 
nar road 


Edge elements 


Hough Transform 
and GPS data 
fusion 


EKE 


£:i/N.A./ 
N.A./N.A. 


Cheng et al. 

(2010,2006) 

[28,38] 


C 


Piecewise linear 


Color analysis 
of lane marks in 
structured roads 
and mean-shift 
clustering in 
unstructured roads 


External vehicle 

elimination 

procedures 


Lane- 

coherence- 

verification 


Si/6000/ 
10/97.4% 


Ruyi et al. 
(2011) [31] 


B 


Locally linear on 
planai- road 


Edge elements 


Edge blobbing 
and distance 
transform 
operations 


PF 


Si/1224/ 
10/N.A. 


Guo et al. 
(2012) [11] 


B 


Planar road with 
segmented 3D 
structure and no 
explicit curvature 
model 


Stereoscopy using 
Markov random 
fields 


Textureless 

regions 

ehmination 


N.A. 


Si/15000/ 
15/N.A. 



A: Autonomous vehicle control, B: Driver assistance, C: Unspecified; Ei: road images, 
E2: controlled scenario, frames: capture length, fps: data capture rate; ^ KF: Kalman Filter, 
EKF: Extended Kalman Filer, PF: Particle Filter, N.A.: Not Available. 



3.1.1. Mean Shift Clustering 

Mean shift is an iterative non-parametric feature-space analysis technique [39], widely employed 
in computer vision and image processing for finding modes in the distribution of certain data of a 
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feature-space, i.e., find the local maxima of the underlying probability density function from the 
available data without actually estimating the density. In other words, mean- shift considers the feature 
space as an empirical probability density function and the data as samples drawn from the underlying 
probability distribution. Considering a finite set of samples 5 C and a weighting function (kernel) 
K : X ^ V- K{x) e IR+, with K{x)dx = 1, K{—x) = K{x). If the kernel is a function with 

compact support on a neighborhood of the origin, often the closed ball A^r(a^) = {w G R" : x|| < r} 
of radius r and centered at x, the weighted mean at x of the samples is then 

^{X) = ^ ^/ _ X 

The repeated movement of data points Si G 5 to sample means m(si) is the so-called mean-shift 
algorithm, i.e., in each iteration s ^ m(s) is performed for all s G 5 simultaneously until the means 
converge [39]. 

Figure 1. Example results from the mean-shift segmentation using bandwidth parameters 

hs = 2,hr = 4. 

Original image Road color clustering 




Our approach to the segmentation of the road pavement area is based on the mean-shift algorithm 
proposed in [40], which considers a feature space composed of spatial domain feature x'^ in the image 
coordinate system and intensity values x'' in the range domain of the L*u*v* color space. The kernel 
considered is a multivariate function defined as the product of two radially symmetric kernels: 



c 



-k 



k 



(1) 



where hg and are respectively the spatial bandwidth and color range parameters, p is the number of 
channels of the color domain, C is a normalization constant and /c : m G IR — )■ |(1 — m^) • 1{m:|m|<i} C IR 



Sensors 2013, 13 



3276 



is the Epanechnikov kernel. The final segmentation is obtained by grouping all pixels that are closer 
than hs in the spatial domain and closer than hr in the range domain, as shown in the example results of 
Figure 1. 

3.1.2. Gabor and GMRF Texture Features 

The second approach implemented for road segmentation is based on the classification of texture 
and color features using a standard multivariate Gaussian classifier (MVGC) [41]. The texture features 
associated to each pixel are generated either as the response to a bank of Gabor filters, or as the 
parameters of a GMRF [42]. To improve the identification of pavement and other elements, RGB color 
components of the Gaussian filtered image are also included into the feature vectors. 

The impulse response of the Gabor filter in the spatial domain coordinates can be expressed as 



h{p\a,\e,(t),po) = exp 



lb -Pol 
2a2 



sm 




Pol 



COS 6 



sm I 



0 



where p = [x y]'^ is the spatial coordinate, A corresponds to the wavelength of the sinusoid, a is the 
spread of the Gaussian envelope, 6 and 0 are respectively the orientation and phase of the Gabor filter, and 
Po is the center of the neighborhood N(po) on which the truncated response is defined (the region of the 
filter mask). In order to discriminate different textures correctly regardless of the textures phase, the total 
response energy of a pair of filters in quadrature phase relationship with 0 = {0, 7r/2} is often employed. 
Quadrature filters are motivated by the fact that odd Gabor masks resemble edge detectors, while even 
masks act as blob detectors. The total response energy is obtained from the convolution of h(-) with the 

image as E'^ {a, \,9\p) = E,^={o,V2} {Speiv{po) ^(^1^' ^' ^' <^'Po)^(p)} ' where / : C ^ Z+, 
p I— )■ I kip) for p E ^l, is the intensity function representing the image. Using the response energies 
at different wavelengths A and orientations 6, it is possible to construct the texture descriptor at pixel 
p as the feature vector Xg{p) = [^^(Ai, 6*115) • ■ ■ E'^{\„ej\s) ■ ■ ■ E'^{\n,9m\s) Sr{p) Sg{p) Sb{p)f E 
where Sr,Sg, Sh are the responses of the red, green and blue channels to a Gaussian filter, and the 
dependency on a has been suppressed by making a a constant or dependent on A [42]. Figure 2 shows 
an example of road pavement segmentation obtained using the Gabor textural features, and the posterior 
morphological opening and closing operations, whose purpose is to remove the small isolated areas and 
fill small holes in the region classified as pavement. 

An alternative description of textural properties is possible in terms of the parameters of Gaussian 
Markov random fields (GRMF) [42], which describe the statistical dependency between the intensity 
I(p) at pixel p and that of its neighbors with the equation I(p) = J^reAN {I{p + r) + I(p — r)) + 
e{p), where AA^ = {r : p ± r E N(p)} for some neighborhood N{p) of p and e{p) is zero-mean 
Gaussian noise. Here it is assumed that the observations I{p) are zero-mean Gaussian, and that the 
dependency of the intensity at the central pixel p is symmetric with respect to the intensities of pixels in 
the neighborhood N{p), i.e., 9^ = 9_r, Vr G N{p). 



A consistent and computationally efficient estimate of the parameters O 

Tj E AN can be obtained applying the least squares method as 

-1 



[9ri 9. 



r2 



9r 



q{p)q{pf 



^ q{j>)I{j>) 
.peni 
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where Vtj is an interior region of the image and 



q{p) 



def 



I{p + ri) + I{p - ri) 
I{p + ^2) + I{p - 

I{P + Tn) + I{p - Tn) 



The estimate of the noise variance can be calculated as = Ylipe^i (Hp) ~ Q*^q{p)Y- 



Figure 2. Example of road segmentation using Gabor textures, (a) Original image; 
(b) Gabor-based texture segmentation; (c) Morphological opening and closing. 




(a) 



(b) 



(c) 



Previous work by the authors [42] has shown that textures can also be adequately characterized by 
feature vectors defined in terms of the estimated GMRF parameters as Xm = [©* v*/ p"^]^, 
where p is the sample variance of the texture as a feature vector [0* v*]^ . 



3.2. Lane Boundaries Extraction 



Once the road pavement areas have been segmented, the lane boundaries are obtained computing 
the spatial gradient directly on the segmented image. This is key to making the approach more robust 
to shadows and occlusions than other methods that extract lane boundaries by computing the spatial 
gradients directly on the pixel intensities of the original image. To compute spatial gradients, here we 
employ steerable filters because they are computationally efficient and, as shown in [12], can reliably 
detect continuous and segmented line markings under different road conditions. On the other hand, 
steerable filters directly applied to the original image as proposed in [12] provide a benchmark method 
against which we can compare ours. 

Steerable filters correspond to oriented LoG filters calculated using three precomputed second-order 
derivative masks, one in the horizontal direction, Guu, one in the vertical direction, G^y, and one in 
the diagonal direction. Guv, allowing an efficient calculation of the oriented filter Ge in any direction 
9 according to G'e('u, v) = GuuCos'^{9) + GyySm^{9) + Guv cos{9) sin(6'); (see [43] for further details). 
In order to maximize the response of the steerable filters to the lane marks and road boundaries, a bank 
composed by two filters is applied to the road area extracted in the previous step. One filter is oriented 
along the normal of the expected left boundary of the lane, while the other along the normal of the 
expected right boundary. The thresholded responses to both filters are then combined into one unique 
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response applying the logical "or" operator. The orientation of both filters is updated in each iteration 
according to the orientation of the lane's boundaries that were obtained in the previous iteration. 

To further improve the extraction of boundaries and lane markings, undesired steerable filter responses 
produced by potholes, cracks and other spots in the image are removed if the segments do not satisfy: (i) 
a minimum length criterion; or (ii) a geometric criterion which measures the similarity of the segment to 
a curve. The geometric criterion is based on the fact that a straight line segment represents the degenerate 
case of an ellipse whose eccentricity e = //a (the ratio between its focal length / and its major 
semi-axis a) is one. Hence, it is possible to discard all segments of connected pixels for which their 
approximating ellipse does not satisfy e > 5i, for some 5 G (0, 1), and their length is not greater than 
a minimum length 82- The parameters 5i and 82 can be estimated from the standard road geometry, 
average lane marking lengths and perspective considerations. Fitting ellipses to segments of connected 
pixels can be quickly done by computing the second moment of the group of pixels. This is a rather 
common approach to find the major/minor axes of the ellipsoid enclosing a cloud of points, considering 
that level-sets of a multivariate Gaussian distribution are ellipsoids, and that the ellipsoid axes are 
oriented along the eigenvectors of the covariance matrix, while the magnitude of the axes is given by 
the corresponding eigenvalues [44]. 

Figure 3. Application of steerable filters, and the edge length and eccentricity selection 
criteria to a road image of the Caltech Lanes dataset [45]. (a) Segmented road area using 
Gabor textures; (b) Response to steerable filter with d = after thresholding; (c) Response 
to steerable filter with 9 = +^ after thresholding; (d) Logical "or" operation of (b) and (c), 
and posterior edge selection based on the segment length and eccentricity criteria. 







/ - 


(a) 


(b) 


\ 

1 . 




j \ 


(c) 


(d) 



An example of the results obtained when applying the steerable filters to the segmented road pavement 
of Figure 3(a) is shown in Figure 3(b and c), with filter orientations 6 = —n/6 and 9 = +7r/6, 
respectively. Once the logical "or" of the thresholded responses to the two oriented filters is computed, 
the edge selection criteria are applied to obtain the final edges shown in Figure 3(d), from which the lane 
geometry will be estimated. 

The line segments extracted using the steerable filters and the geometric selection criteria must be 
labeled as left-boundary, right-boundary or center lane-markings. To this end, the Hough transform [46], 
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"H : Isij)) ~^ H{p, 0), of the segments image hip) is computed and the centroids of the three largest 
local maxima in the Hough matrix H{p, 0) are calculated, thus yielding the (p, 6) parameters of the 
major lines in the road. Segments of connected pixels are selected or discarded depending on how close 
their (p, 6) parameters are to the major lines. This process allows to further reduce the amount of local 
lines that do not correspond to actual lane boundaries. On the other hand, the principal (p, 6) parameters 
provide initial estimates for the lane position and geometry computation as will be explained in the next 
section. Furthermore, tracking and predicting the lane position in the next frames can be done efficiently 
in the (p, 9) parameter space using the Kalman filter instead of tracking the set of all salient points 
in the image. 

3.3. Lane Geometry and Position Computation 

Computing the lane geometry and position requires to fit some mathematical model of the road to the 
boundary segments extracted in the previous steps. To this end, it is convenient to project the segments 
in the image back from the camera's optical plane to the plane of the road. This process is often referred 
to as the Inverse Perspective Mapping (IPM) and allows to generate a top-view of the road [47]. 

We construct the IPM using the standard pinhole camera model under the following assumptions: 

(i) the world coordinate system Sw {x^, y^, z^} is fixed to the vehicle (see Figure 4), 

(ii) the camera is mounted on the vehicle at some given constant height h with respect to the ground, 
and tilted about its focal point in such a way that its optical axis forms an angle 6q with respect to 
an axis parallel to passing through the focal point. 

(iii) the road ahead the vehicle is locally flat in the first ~100 m, i.e., has negligible curvature 
with respect to the ground tangent plane at the vehicle's current position. This means the lane 
boundaries are modeled as curves contained in a 2D plane which is tangent to the wheels of the 
vehicle. However, it is to be noted that the plane has a slope angle Og and bank angle 6^. While 
these angles are not relevant to the IPM calculations, they are considered in the dynamic model of 
the vehicle employed in the implementation of the extended Kalman filter explained in Section 3.4. 

For clarity of exposition, the plane containing point p shown in Figure 4 is an enlarged version of the 
camera's optical plane array, whose size is assumed to be m x n pixels large with m and n odd. The point 
p can be expressed as a coordinate pair p = (m, v) in the camera's reference system Sc = {u, v}, where 
u is the camera's horizontal axis, while v is the camera's vertical axis. The origin of Sc is centered on 
the optical plane at the intersection with the optical axis. The position of p can also be expressed in 
terms of the standard sensor's array or image row-column coordinate pair (r, c) G and the pixel size 
11 (m/pixels). The expressions relating the coordinates (m, v) and (r, c) are thus given by: 



v{r) 



m + 1 



— r ^ r{v) = 



m + 1 



-1 



(2) 



2 



2 

n + 1 



— Vfl 



n + l 



=^ c{u) = 



-1 



(3) 



fi c — 



2 



2 



+ up 
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Figure 4. Inverse perspective projection model: point p on the optical plane is projected 
back onto point P on the road. 




As shown in Figure 4, the point p = (u, v) with associated coordinates (r, c) can be projected back 
onto a point P on the road with coordinates (x, y, 0) in Sw, i-^-, projected onto the plane x^y^, using 
Equations (2) and (3) and projective geometry. The coordinates x and yofP are given by the following 
inverse perspective mapping (IPM) equations: 



X r 



y {r, c) 



h 



h 



1+ [l-2(^)]tanK)tan(go)' 
tan(eo)- [l-2(^)]tan(a,) 

[l-2(^)]tanK) 
sin(^o) - [1 - 2 (^)] tan(a,) 005(^0) 



(4) 



(5) 



where and a„ correspond to half of the horizontal and vertical angular field of view of the camera, and 
60 is the camera tilt. If the camera lens has a focal distance /, then tan(Q:„) = u{n) // = (n — (2/) 
and tan(«,) = v{l)/f = (m - 1)^/(2/). 

Similarly, any point P = (x, y, 0) on the road (with coordinates expressed in Sw) can be projected 
onto a point p on the optical plane array with associated coordinates (r, c) given by: 



r (x) 
c{x,y) 



m 



2 

n — 1 



1 + 



h — xtan(0o 



(h tan(^o) + ^) tan(a„) 

y 



+ 1 



+ 1 



(6) 
(7) 



{h sin(6'o) + x cos(6'o)) tan(a„ 

Equations (6) and (7) are useful for projecting on the image the computed boundaries and providing the 
driver with a visual confirmation of the operation of the system. 
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Due to the nonlinear form of Equations (4) and (5), evenly spaced increments in r and c do not 
produce evenly spaced points (x, 0) on the road plane as illustrated in Figure 5(a), which shows the 
top view of the left boundary and lane markings of the road lane in Figure 3(d). In other words, pixels 
covering the horizon represent larger distances of the road than pixels covering the pavement close to 
the car. In order to correct this situation that complicates fitting the model, evenly spaced interpolation 
is performed to ensure a uniform distribution of points per unit of length in the IPM image of the lane 
boundaries. The equispaced interpolation is shown in Figure 5(b). 

Figure 5. IPM Equations (4) and (5) applied to lane boundaries of Figure 3(d). 
(a) IPM of lane boundaries without interpolation; (b) IPM of lane boundaries after 
equispaced interpolation. 
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The lane boundary points (x, y) on the ground plane x'^y^ must be fitted to a mathematical model 
of the road in order to determine the lane position relative to the world reference system Sw The 
standard mathematical model assumes the curves are designed to produce a gradual transition from a 
straight section with infinite curvature radius to another straight section oriented in a different direction 
by linearly increasing and decreasing the curvature, thus preventing sudden changes in centrifugal forces. 
Mathematically, the linear increase/decrease in curvature is represented by segments of the Euler spiral, 
often called spiral transition curve or clothoid [48,49]. 

The normalized clothoid curve {\/2RqSq = 1) can be approximated by a parametric representation 
in polynomial form after replacing the terms cos(,s^) and sin(s^) by their power series expansion and 
integrating, thus yielding 

y(L) = VST^f sm(.=)d.= + + (9) 

where Rq corresponds to the radius of the circular curve at the end of the spiral and L the length of 
the spiral curve. The third order power series truncation is a practical approximation in road designs, 
which is amenable for simpler and faster computation of the road geometry. Hence, we approximate the 
clothoid model Equations (8) and (9) by a cubic polynomial: 

y{x) = a^x^ + a2x'^ + aix + ao (10) 
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The parameters a^, i = 0, 1, 2, 3, for the lane boundary model Equation (10) can be robustly estimated 
despite occlusions and shadows using the MS AC (M-estimator Sample And Consensus) algorithm [50], 
which is a variation of the standard RANSAC algorithm [32] that weights the inliers according to how 
well they fit to a model, while the outliers are given a constant weight, thus improving the robustness of 
the estimation with no additional computational cost [51]. The application of the MS AC algorithm 
is presented in Figure 6, which shows the fitted polynomials (green continuous curves) despite the 
notoriously large amount of outliers caused by tree shadows. It is possible to notice in Figure 6(b) 
that the cubic polynomial boundaries are correctly located on the right lane of the segmented street, even 
if the connected pixels extracted for the lane's right boundary are very few (red segments in Figure 6(a)). 

Figure 6. MSAC cubic polynomial fitting (green) in a street with many tree shadows, 
(a) Curves fitted to the extracted lane boundaries segments (top-view); (b) Lane boundaries 
superimposed on the segmented street pavement (top-view). 




Sometimes the amount of occlusions and shadows can be even more severe than that of Figure 6 and 
the MSAC algorithm yields incorrect parameters for the polynomial. An example of an extreme situation 
is shown in Figure 7(a), in which the right boundary has to be estimated from a very short segment of 
connected pixels. This short segment leads MSAC to produce a right-boundary that is inconsistent with 
that obtained with the short edge segments of the left-boundary. To verify the reliability of the curves 
fitted with the MSAC algorithm, we implemented an additional step which checks the distance interval 
covered by the points in the consensus set along the longitudinal axis x^, as well as the number of points 
in the consensus sets. It was experimentally found that reliable boundaries could be identified whenever 
the points in the consensus set covered a distance at least between 5 m and 15 m, and the number of 
points was larger than 20% of total number of points that would be available if the edge would have been 
fully segmented. If these conditions are not fulfilled for only one of the lane boundaries and the mostly 
occluded boundary has a non-empty consensus set, then the correctly estimated curve for the mostly 
visible boundary with parameters a\, a^, Ci\: 

UAix) = a^x^ + a2x'^ + a*x + A (11) 

is vertically shifted by an amount A, which is found using the MSAC procedure. The result of this 
correction procedure is illustrated in Figure 7(b), which shows the fitting of the visible left-boundary 
(top line) to the mostly occluded right-boundary points (bottom line). 
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Figure 7. Example of the correction procedure for erroneous MSAC fitting results. 

(a) Erroneous MSAC fitting due to severe occlusion of the lane's right-boundary (red); 

(b) Correction of the lane's estimated right-boundary by vertically shifting the lane's 
estimated left-boundary (blue) and fitting it to the points of the right-boundary (red). 
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Finally, if any of the consensus sets is empty due to total occlusion of any of the lane's boundaries, 
an estimation of the lane's position is obtained using the value predicted by an extended Kalman filter 
(EKE) implemented to track the 3D coordinates of a selection of points belonging to each edge. The 
details of the EKE implementation can be found in previous work by the authors [52]. The difference in 
this work with respect to the preliminary research [52] is that the previous filter considers only a simple 
linear model for the lane boundaries, tracks only the (p, 6) parameters of the lines and does not involve 
any robust segmentation and parameter estimation procedure as proposed here. 

3.4. Extended Kalman Filter for Disturbance Mitigation in Lane Estimation and Tracking 

In order to reduce the effect of disturbances in the estimation of the lane geometry, we implemented 
an extended Kalman filter (EKE) [53] due to its good trade-off among the computational efficiency, 
accuracy and robustness that it provides [54]. Fundamental to the implementation of the EKE are the 
vehicle's dynamics model characterizing the evolution of its state and the observation's model, which 
are briefly described next in the context of the proposed approach. Eor in-depth treatments of vehicle 
dynamics modeling and state estimation methods, the reader is referred to [53,55,56]. 

The state of the vehicle is defined in terms of the linear displacements t^., ty, along the 
vehicle-fixed frame and angular displacements 9^, 9y and 9z about the vehicle's reference frame, and 
their corresponding rates of change, v^, Vy, v^, and u^;, Uy, u^, for the linear and angular velocity, 
respectively. The inputs to the model can be divided into manipulated and disturbance variables. The 
manipulated variables are the thrust force T/, which is adjusted by the throttle controlling the amount 
of fuel going into the engine, and the steering angle 5 adjusted by the steering wheel. The steering 
angle 5 is the angle between the wheels' rotation plane and the longitudinal axis of the vehicle. 
The thrust force T/ is assumed to be contained in the plane of rotation of the steered wheels and 
tangent to the ground plane in the contact pad of the wheels, i.e., we assume a front-wheel-drive model 
in which Tf acts along the direction 5 of the steering angle. The disturbance variables in turn can 
be divided into those representing the vertical geometry of the road described in terms of the slope 
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angle Og and the bank angle 6^, and those representing random disturbances in acceleration due to 
multiple phenomena, such as wheel slippage and skidding, potholes, speed bumps and humps, raised 
crosswalks, other vehicle-tire-ground interactions, and aerodynamic drag forces, among others. The 
slope angle Og is the inclination angle of the x^-axis with respect to the horizontal inertial plane, i.e., 
6s is the so-called elevation or pitch angle 6 in Tait-Bryan convention for Euler angles. The bank 
angle 6'f, is the lateral tilt about the x^, i.e., 9^ is the so-called bank or roll angle in the Tait-Bryan 
convention for Euler angles. The bank and slope angles change slowly and can be assumed to be 
piece-wise constant. Moreover, these two variables can be measured with a gyroscope, and therefore, 
can be input to the model as feedforward (measured) disturbance variables. The linear and angular 
acceleration disturbances are modeled as zero-mean i.i.d. Gaussian noises Ux, Uy, and d^, dy, d^, 
respectively. The linear and angular acceleration noises allow to take into account the uncertainty about 
changes in bank and slope angles of the road, as well as other aspects of the unmodelled dynamics 
whose effect for practical purposes can be adequately handled by a simplified stochastic model, either 
because the effect is negligible in comparison with the main model dynamics, or because in practice 
the cost of accurate parametrization simply outweighs the gains in the accuracy and precision of 
the state estimation. 

Using the above state, input and disturbance variables, the vehicle dynamics is described by the 
following state-space model formulated in the vehicle-fixed coordinate frame Sw (see [55,56] for 
detailed standard model derivations): 

Vx 

Vy 

<^x 



tx 




ty 




iz 




Ox 








Qz 




Vx 




Vy 




Vz 








Uy 




^Z 





y 

^Tf cos(5) - UyV^ + u^Vy - ^v^ - g sin 
—ujzVx + ojxVz — g cos{9 
-u^Vy + UyV^ - g cos{9, 

+ + dx 

'-UJxi^Z +^'9s + dy 

-ujxOJ,, + j-Tf sin (5)// + dz 



I sm(^ 
cos( 



h) + riy 



L 



where m is the total mass of the vehicle, g is the acceleration of gravity, and I^, ly 



(12) 



J, are the inertia 



moments about the principal axes of the vehicle (the products of inertia I^y, Ixz and lyz are neglected 
because due to the vehicle's symmetry and choice of axes, they are much smaller than the principal 
moments I^, ly, h)- The parameter is the drag coefficient along the longitudinal axis x^. Here the 
model considers that the lateral tire forces of the front and rear wheels, as well as the normal forces of 
the front and rear wheels, cancel each other when calculating the moment balance about the vehicle axes. 
This loss of information is taken into account by the angular acceleration disturbance terms d^, dy and d^. 
Defining a state vector ^ = (tx, ty, t^, 9^, 9y9z, Vx, Vy, v^, Ux, ujy, io^Y, inputs vector v = {Tf, S, 9b, 9s), 
and disturbance vector u = (n^, Uy, n^, d^, dy, d^), it is possible to write the continuous-time state-space 
model in compact form as ^ = f i;, z/). 
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The measurements vector is defined by the vector containing the collection of points 



\T. 



Pi 

P2 

Pn 



(13) 



segmented from the lane boundaries and expressed in the vehicle-fixed reference frame. The dependence 
of the points Pj on the vehicle's state vector ^ results from the fact that the camera frame is rigidly 
fixed to the vehicle and related to the vehicle frame by a constant coordinate translation and rotation 
transformation. Therefore, the positions of points Pj of the lane boundaries (relative to the vehicle 
frame) change as a result of the vehicle motion according to the well-known equation: 













^X 






^ i — 


m 




Vy 




Uy 


X 


Vi 




Zi 




Vz 








Zi 



1,2, 



(14) 



The measurement model is simply given by C = g('C)f) = /^(O + where e G IR is a vector of 
Gaussian random noises accounting for the errors in the segmentation process. 

By Euler discretization it is possible to find a recursive approximation of the predicted measurements 
C,{k + 1|A;) at instant + 1 in terms of the past measurements and the vehicle's state vector at time k. 
Thus the discretized measurement prediction model can be formulated as: 



ak+m 



Pi(fc) - TsV{k) - T,n{k) X Pi(fc) 
P2{k) - T,V{k) - T,n{k) X P2(A;) 

P^(fc) - TsV{k) - TM{k) X P^(fc) 



(15) 



where Ts is the sampling time, V{k) = [vx{k),Vy{k),Vz{k)]'^ and ^l(k) = [ujx{k),ujy{k),uJz{k)] are the 
vectors of translational and angular velocities of the vehicle at time instant k. 

The discrete measurement prediction model Equation (15) together with the discretization of the state 
model Equation (12) can be employed to formulate an extended Kalman filter that allows to reduce 
the disturbances on the lane geometry estimation produced by the vehicle's motion due to driving 
maneuvers and vibrations arising mainly from the vehicle-ground interaction. More detailed models 
of the vehicle's dynamics that consider different aspects of wheel-to-ground contact are possible, see for 
example [55,56]. Although these models can be used to extend the model employed here, in practice we 
noticed that the model Equation (12) can be further simplified without severely affecting the accuracy 
of the results obtained. The main simplifications rely on the fact that the EKE does not lose its capacity 
to filter high frequency disturbances due to potholes, speed humps, speed bumps, raised crosswalks or 
small irregularities on the pavement, when the dominating vertical geometry of the road changes at slow 
rates in comparison with the rate of change in longitudinal and directional variables, i.e., ujx,cOy ^ oJz 
and Vy,Vz ^ Vx- It is to be noted that oversimplified models for the EKE such as the kinematic model 
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proposed by the authors in [52], while helpful in filtering the measurement noise, in practice are less 
robust to process disturbances, especially when the vehicle moves at higher speeds. 

3.5. Lane Departure Warning 

The lateral distance of the car to the lane boundaries, given by the parameter of the lane boundary 
model Equation (10), can be used to compute a measure useful for alerting drivers of an imminent and 
possibly dangerous unintended line crossing or departure. This measure also takes into account the 
lateral velocity of the vehicle, which is used to calculate the so-called time-to-lane crossing (TLC) [57]. 
At a given discrete-time sampling instant k, the TLC is computed as: 



where dk is the smallest of the lateral distances to the lane's right or left boundaries and Vk is the lateral 
velocity. For practical purposes, Vk can be adequately approximated by the following moving average of 
a first-order discrete-time approximation of the lateral distance time-derivative: 



where represents the time at sampling instant i and is the number of sampling instants considered 
for averaging. 

4. Testing Methodology, Experimental Results and Computational Cost 

4.1. Testing Methodology and Experiment Setup 

The performance of the proposed lane sensing method and departure warning system is evaluated 
using image sequences acquired during real driving conditions on highways and urban roads in the 
city of Santiago, Chile. The datasets are grouped in four different scenarios that are summarized 
in Table 2 (dataset available on-line at: http://ral.ing.puc.cl/datasets/ldw). These scenarios consider 
different lighting conditions, such as daytime, sunset/dusk and various road conditions with and without 
lane marks, shadows, and occlusions due to other vehicles, as well as varying road geometry, including 
straight and curved streets. Curved streets are streets with non-zero curvature in the road plane tangent 
to the vehicle's wheels. The scenarios considered here include roads with relatively constant slope and 
bank angles, i.e., their geodesic and torsional curvature (change in slope and bank angles) is negligible 
in the proximity of the vehicle up to a 100 m. This choice of scenario was motivated by the interest 
in testing the ability of the proposed approach to measure the road curvature in the plane tangent to 
the wheels, which is key for lane keeping and departure warning more than the torsional and banking 
curvatures. Examples showing the projection of the estimated lane boundaries onto the image plane 
using Equations (6) and (7) are presented in Figure 8 for the four scenarios. 



TLCk 



(16) 




(17) 
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Table 2. Scenarios for the performance evaluation of the lane sensing and departure 
warning system. 



Scenario 


Road Geometry Lane Marks Quality 


Lighting Condition 


Frames 


Lane boundaries detection and tracking system 


1 


Straight Fair 


Daytime, no shadows 


340 


2 


Straight and curved Fair 


Daytime, no shadows 


350 


3 


Straight Poor 


Sunset, shadows 


410 


4 


Straight and curved Poor 


Dusk, shadows 


380 


Lane departure warning system 


1 


Straight Fair 


Daytime, no shadows 


340 


2 


Straight and curved Fair 


Sunset, shadows 


280 


3 


Straight and curved Poor 


Dusk, shadows 


300 



Figure 8. Examples of lane detection and tracking results for the considered scenarios, 
(a) Scenario 1; (b) Scenario 2; (c) Scenario 3; (d) Scenario 4. 
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Figure 9. Vehicle for data acquisition. 



Camera 




The video datasets were acquired using a Point Gray® Firewire camera with a 640 x 480 pixels 
1/2"CCD and a Tamron varifocal lens with focal distances in the range of 6-12 mm, corresponding to a 
field view in the range 30.4 ° x 23.1 ° (telephoto) — 58.7° x44.4 ° (wide). The camera was rigidly attached 
to a structure supporting our sensors suite (GPS, thermal, omnidirectional and standard perspective 
cameras) mounted on the roof of our car test bed, as shown in Figure 9. The camera was located at 1.8 m 
from the ground with its optical axis pointing to the horizon. With this setup the look ahead distance 
is about 40 m. The resolution in real world coordinates depends on the image coordinates as can be 
observed from Equations (4) and (5). The real spatial resolution for an image point with coordinates 
(r, c) is given by = x(r, c) — x(r — 1, c) and Ay = y{r,c) —y{r,c—l). Considering the closest point 
to the car (first row r = 1, central column c = 320), and using Equations (4) and (5), the real spatial 
resolution with the 640x320 pixels camera and / = 9 mm is (A^,, A^) = (59, 12) mm. This resolution 
corresponds to that of the pixel looking at a point in front of the car at 14 m straight ahead. Similar 
computations for a point 30 m ahead yield a spatial resolution of (A^^, Aj^) = (583, 54) mm, i.e., the 
pixel covers approximately 60 cm (depth) by 5 cm (width) of the real world. If a pixel in the horizon is 
considered, i.e., a pixel with a„ = a^, = 0, then the theoretical spatial resolution of the pixel is infinite, 
as clearly shown by the denominators of Equations (4) and (5), which become zero, since also = 0 
for the horizontal camera. As shown by the segmentation results presented in Figures 1, 2 and 6(b), 
the road segmentation is not perceivably affected by changes in spatial resolution. In the gradient and 
mean-shift approaches this is explained by the fact that the former only looks at the change in intensities, 
while the latter performs a sort of clustering by color similarity and spatial proximity. In the case of the 
approaches using Gabor and GMRF textural features, the relative insensitivity to the spatial resolution of 
each pixel is achieved by training the classifier with training regions selected from different areas of the 
road pavement close and far from the vehicle. This ensures that in the operating range of the sensor, the 
segmentation is performed as correctly as possible, even if in the general case texture is a scale dependent 
cue. Hence, in general the road segmentation approaches are not significantly affected by the resolution 
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of the imaging sensor. However, it is to be noted that the resolution of the imaging sensor does impose a 
limit on the accuracy with which the position of the car can be resolved with respect to the edges of the 
road. If each pixel represents about 5 cm of width in the real world, then in the ideal disturbance-free case 
with subpixel resolution it would be possible to expect lateral positioning accuracies better than 5 cm. 
However, interpolation errors and noise make the achievable positioning accuracy worse than the ideal 
5 cm. Therefore, regardless of the road segmentation strategy, if higher lateral positioning accuracies 
are sought, the imaging sensor and the associated optics will have to be replaced accordingly to achieve 
accuracies that can be estimated using Equations (4) and (5). 

4.2. Experimental Results: Tracking Accuracy, Misdetections and False Alarms 

The accuracy and precision of the lane detection and tracking algorithm is measured in terms of the 
mean absolute error (MAE), the root mean square error (RMSE) and the standard deviation of the error 
{as) between the estimated lane geometry and the manually identified curves for the lane boundaries in 
the analyzed datasets. These results are presented in Table 3. The computation of the MAE, RMSE and 
standard deviation consider the estimation errors for the left and right lane boundaries simultaneously. 
The MAE and RMSE errors are computed pointwise, i.e., if the ground truth curve is represented by a 
set of points (xj, y{xi)), i = 1,2, . . . , N, sampled from the real lane boundary, and the estimated curve 
is given by (xj, y{xi)), i = 1,2, . . . , N, then the estimation error at point Xi is = y{xi) — y{xi). On 
the other hand, the reliability of the lane departure warning is measured in terms of false positive (FP) 
and false negative (FN) rates, as shown in Table 4. 

The classic gradient-based approach with steerable filters proposed in [12] is used as a benchmark for 
evaluating the accuracy and reliability of the proposed approach under the different road segmentation 
strategies, i.e., using Gabor features, GMRF features and mean-shift clustering. The Gabor-based 
segmentation employed a bank of eight filters considering wavelengths A = {8, 4} pixels and 
orientations 9 = {0°,45°, 90°, 135°}; (for a discussion on the selection of adequate parameters for the 
Gabor filters, see [58]). The GMRF-based segmentation employed the star-like neighborhood N(p) as 
recommended in [42]. The mean-shift clustering was performed using a spatial bandwidth hg = 2, and 
range parameter hr = 4, which on average yielded the best segmentation of the road's pavement and 
shadows under various lighting conditions. The experiments also compare the use of a quadratic lane 
boundary model in addition to the cubic model Equation (10), and evaluate the benefits of using the EKE, 
whose parameters can be found in previous work by the authors [52] as mentioned before. 

The last three rows of Table 3 summarize the accuracy and precision results for the different lane 
detection and tracking schemes. The results show a 14.6%-22.6% reduction in RMSE and MAE on 
average when Gabor filters are employed without the EKE. An additional 7.1%-8.8% reduction on 
average is possible thanks to the EKE. The GMRF-based segmentation also improves the lane detection, 
reducing the error by 13.2%-23.9% on average for RMSE and MAE metrics compared with the gradient 
method. An additional 6.0%-6.9% improvement is achieved using the EFK with the GRME texture 
features. The segmentation using the mean-shift algorithm yields the best results with a reduction of 
38.0%-38.9% on average in the same metrics with an additional 5.5%-8.1% of reduction in the error 
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when EKF is used. The variance of the error also decreases with any of the three segmentation methods. 
Once again the mean-shift clustering yields the results with lowest variance. 

Table 3. Lane detection and tracking results. 







Method 






G 




Gabor Filter Segmentation 


GMRF Segmentation 


Mean-Shift Clustering 










G 


G + EKF 


G 


G + EKF 


G 


G + EKF 






Q 


C 


Q 


C 


Q 


C 


Q 


C 


Q 


C 


Q 


C 


Q 


C 




Ml 


12.12 


13.21 


10.11 


9.81 


9.21 


9.31 


11.13 


12.04 


11.01 


10.98 


9.11 


9.51 


8.51 


8.62 


Si 


M2 


13.33 


14.81 


11.15 


10.15 


9.51 


10.11 


13.01 


13.05 


12.05 


12.33 


9.47 


10.03 


9.02 


9.11 




Ma 


3.21 


3.92 


2.91 


3.01 


2.51 


3.13 


4.01 


3.98 


3.68 


3.56 


2.19 


2.52 


2.21 


2.41 




Ml 


13.62 


14.11 


9.81 


9.71 


9.11 


9.12 


11.51 


10.65 


11.21 


10.45 


9.31 


9.61 


8.05 


8.42 


S2 


Ma 


14.77 


14.96 


10.52 


10.33 


9.51 


10.11 


12.01 


11.31 


10.56 


10.87 


9.79 


10.03 


9.12 


9.21 




Ms 


3.41 


3.99 


2.16 


2.81 


2.01 


2.38 


5.01 


4.67 


4.51 


4.01 


2.11 


2.42 


2.11 


2.19 




Ml 


15.62 


19.11 


13.81 


13.93 


12.19 


12.82 


14.21 


13.56 


14.11 


13.52 


9.41 


10.11 


8.91 


9.43 


S3 


M2 


16.75 


20.96 


14.51 


15.11 


13.16 


13.21 


15.32 


13.89 


14.54 


15.02 


10.09 


11.31 


9.52 


9.81 




Ma 


6.11 


7.97 


4.16 


3.81 


3.14 


3.28 


4.71 


6.11 


4.65 


3.98 


2.51 


2.28 


2.19 


2.25 




Ml 


18.21 


18.41 


17.13 


18.95 


16.09 


14.81 


14.65 


14.67 


13.67 


14.01 


9.11 


10.41 


8.21 


8.72 




M2 


19.58 


19.96 


18.21 


19.15 


17.66 


15.11 


15.55 


15.51 


14.32 


14.33 


10.01 


11.81 


9.33 


9.29 




Ms 


7.91 


8.12 


5.61 


5.18 


5.04 


4.33 


4.56 


5.21 


4.19 


4.97 


2.11 


2.18 


2.10 


2.17 




Ml 


14.89 


16.21 


12.71 


13.10 


11.65 


11.51 


12.87 


12.73 


12.50 


12.24 


9.23 


9.91 


8.42 


8.79 




M2 


16.10 


17.67 


13.61 


13.68 


12.46 


12.13 


13.97 


13.44 


12.86 


13.13 


9.84 


10.79 


9.25 


9.36 




Ms 


5.16 


6.00 


3.71 


3.70 


3.17 


3.28 


4.57 


4.99 


4.25 


4.13 


2.23 


2.35 


...2.15 


27,5 



Sii Scenario i = {1, 2, 3, 4}; Mi: MAE [cm]; M2: RMSE [cm]; Ms: as [cm]; G: Gradient features extraction; EKF: Extended Kalman 



Filter; Q: RANSAC quadratic curve fitting; C: RANSAC cubic curve fitting. 



Table 4. Lane departure warning results. 



Method 


Metric [%] 


Scenario 1 


Scenario 2 


Scenario 3 


Mean 




FP 


8.12 


11.81 


12.41 


10.78 


Gradient 














FN 


12.43 


13.44 


9.92 


11.93 




FP 


5.76 


4.44 


6.11 


5.44 


Gabor Filter Segmentation 


FN 


6.31 


5.65 


5.69 


5.11 




FP 


5.16 


4.09 


5.53 


4.93 


Gabor Filter Segmentation and EKF 


FN 


4.45 


5.71 


5.27 


5.14 


GMRF Segmentation 


FP 


6.44 


7.94 


6.67 


7.02 














FN 


7.12 


4.12 


4.11 


5.12 




FP 


5.23 


6.23 


6.01 


5.82 


GMRF Segmentation and EKF 


FN 


6.51 


3.25 


3.01 


4.26 




FP 


4.01 


5.32 


5.13 


4.82 


Mean-shift Clustering 














FN 


4.21 


3.31 


4.84 


4.12 




FP 


3.71 


4.88 


4.91 


4.50 


Mean-shift Clustering and EKF 








FN 


4.01 


3.21 


4.39 


3.87 



The difference between the RMSE obtained using the quadratic approximation to the road model 
and that obtained with the cubic approximation is on average 0.73 ± 0.92 cm. In some scenarios and 
for some methods, the quadratic model is slightly better than the cubic model, while in other scenarios 
and with different segmentation approaches, the results show that the cubic model is better by a slim 
margin. Similarly, the standard deviations of the errors differ by considerably small amounts. Therefore, 



Sensors 2013, 13 



3291 



for practical application the simpler quadratic model may be enough, especially considering that the 
purpose of the lane departure warning system is to determine whether the vehicle is at risk of leaving 
the lane given its current speed and the road's curvature. Using a quadratic road model, i.e., 0:3 = 0 
in Equation (10), the maximum (worst case) curvature is precisely given by the second order derivative 
of the quadratic polynomial, i.e., the maximum curvature is 2a2. This maximum curvature provides a 
measure of the amount of steering that the driver should be applying to correctly handle the curve. 

The last column of Table 4 summarizes the reliability of the lane departure warning system in terms 
of FP and FN rates. The different approaches are evaluated using a threshold of 5 = 1.0 s for the 
minimum allowable time to lane crossing (TLC) before triggering an alarm of unintended lane departure. 
In comparison with the gradient-based benchmark method, a decrease in the FP and FN rates of 49.5% 
and 52.3% is achieved on average using Gabor filters, with an additional decrease of 4.7% and 4.6% on 
both metrics when also the EKF is used. The GMRF-based segmentation yields a decrease in the rate 
of FP and FN of 34.8% and 57.1% on average, and an additional 11.1% and 7.2% when also the EKF is 
employed. Finally, when comparing the mean-shift clustering approach to the gradient-based benchmark 
method, a larger decrease in the FP and FN rates of 55.3% and 65.5% is possible, with additional relative 
reduction of 2.9% and 2.1% with the EKF. The lane departure warning estimation has no significant 
impact on the processing times, since it relies on the results of the lane detection and tracking step to 
compute simple mathematical operations involved in the TLC measure. 

4.3. Computational Cost 

The theoretical computational complexity for each of the road segmentation approaches is 
summarized in Table 5 (fourth column). The computational complexity is a function of (i) the size of the 
input and parameters that predefine certain volume of computations; and (ii) the type of computational 
process or mathematical operation involved. The input data and parameter characteristics of each 
road segmentation strategy are shown in the second column of Table 5, while the main computational 
process behind each approach is described in the third column. In the case of the gradient-based 
and Gabor-based approaches, the main operation that contributes to the algorithms' complexity is the 
convolution of the image with each filter response. It is possible to show (see for example [59]) that 
the standard implementation of the convolution of a filter mask of size P x Q with an image of size 
M X is 0{PQMN), and therefore, the main difference in computational complexity between the 
gradient and the Gabor approach is the number of times D and B that the convolution must be repeated, 
respectively. On the other hand, the computational complexity of the GMRF approach arises from the 
solution of a linear least squares problem involving matrix inversion and multiplication steps. Thus, 
while the convolution of two sets with n data points has a complexity 0{n^), the matrix inversion of 
an n-dimensional matrix has a complexity 0{n^) using standard Gauss-Jordan elimination. Due to 
the connection between matrix multiplication and matrix inversion, the number of operations for the 
solution to the least squares problem could be reduced with Strassen or Coppersmith-Winograd matrix 
multiplication algorithms [60], but with substantial bookkeeping overhead that makes these algorithms 
not practical unless n >> 100 [61], and thus they are not implemented here. Since the GMRF approach 
takes the expected value of the solution to a linear least squares problem in a neighborhood of size n, the 
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actual cost of GMRF becomes 0{n^). If the size of the GMRF neighborhood is P x P and is assumed 
symmetric, then n = so the cost at each location becomes at least of the order 0{P^ /16). Finally, 

the complexity of the mean-shift approach stems from the neighborhood query for each of the n points in 
the dataset, and therefore, if the points belong to a d-dimensional dataset and iterations are performed, 
the complexity of the naive method is 0{n^dN) [62]. 

Table 5. Computational complexity of the different approaches. 



Approach 



Input 



Complexity Sources 



Computational Complexity 



Gradient 



Gabor 
segmentation 



GMRF 
segmentation 



Mean-sliift 
clustering 



Grayscale image 
of M X N pixels 
P X Q convolu- 
tion mask 
Gradient Orienta- 
tions D 

Grayscale image 
oiM X N pixels 
P X Q convolu- 
tion mask 
A wavelengths, 8 
orientations, and 
4> phases 

Grayscale image 
of M X N pixels 
Px P symmetric 
neighborhood 
with R = 
p2/2 + 1 
parameters 



Color image of 
M X N pixels 
and C channels 
Maximum mean 
shift iterations K 



Convolution of a bank of 
D oriented Laplacian of 
Gaussian filter responses 
of size P X Q with the 
image. 



Convolution of a bank of 
B Gabor filter responses 
of size P X Q with the 
image, with B = \ ■ 8 ■ ij>. 



Linear least squares esti- 
mation involving a R X 
R matrix inversion with 
complexity 0(-R'') and a 
matrix-vector multiplica- 
tion (R X R.) ■ R with 
complexity repeated 
R times on each pixel 
neighborhood and, then re- 
peated MN times for each 
image pixel, i.e., 0{R ■ 
{2i?3)) for each pixel. 
In the naive 

implementation, query 
all the points in the 
dataset (neighborhood) 
around the current point 
to check if the kernel 
of each point in the 
dataset covers the current 
point. If a tessellation 
strategy is implemented, 
an improvement factor a, 
with 0.05 < o < 0.1, is 
possible. 



0{PQMND) 



0{PQMNB) 



0{{P^/16)MN) 



0(a-{MNf{C + 2)K) 



The comparison of the computational complexities at the practical level is not simple because there 
are many implementation aspects that are specific to each approach, for example, the computation 
neighborhood (subwindow block) centered at each pixel must be of a different size in each approach 
in order to achieve the best segmentation. On the other hand, while Gabor and GMRF require the 
training of a classifier (which is done off-line only once), the approaches based on the gradient and 
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mean-shift computation do not require the construction of a classifier. Hence, there are other factors, 
such as the choice of classifier, that can affect the computational cost of the approach in ways that are far 
from subtle. However, a gross estimate of the relative computational complexity can be obtained with 
the simplifying assumption that the computation neighborhood of size P x Q centered at each pixel of 
interest is (i) square and (ii) proportional to the image size by a factor /3, i.e., P = Q = (3M = (3N. 
With this assumption, the computational complexity for each of the four methods in Table 5 will be 
given respectively by 0{(5'^M^D), 0{l3^M'^B), 0{l3^M^°), 0{a^^M\C + 2)K) < 0{a(5^M^), for 
the gradient, Gabor, GMRF, and mean-shift approaches. If additionally it is assumed that (3 = 10^^, 
M = 10^, then the order of floating point operations will be respectively, lO^^D, lO^^B, 10^^, 
and 10^. For D = 3 (three directions) and B = 16 (a Gabor filter bank of dimension 16), the 
Gabor-based approach should take about 5 times more to compute than the gradient-based approach, 
the GMRF- approach would be in theory at least 10^° times slower than the gradient-based approach, 
while the mean-shift approach would be similar or faster to compute than the gradient-based method. 
In practice, for our implementation on a computer with a 2 GHz Intel® Core™ 2 Duo CPU, it was 
observed that the road segmentation with the Gabor filter takes 4 times longer to compute than the basic 
gradient-based method, as shown by the results summarized in Table 6. This result is in close agreement 
with the theoretical estimates. However, contrary to the theoretical predictions, the mean-shift method 
was 2 times slower than the gradient-based approach, but as it yields processing rates of 5.6 fps on 
average on a standard CPU, it should be suitable for real-time computation on dedicated hardware. 
On the other hand, the GMRF-based approach implemented with an eleven-dimensional feature vector 
was 300 times slower than the basic gradient-based approach. This is significantly better than the cost 
that can be predicted from the theoretical complexity estimates, but due to the large processing time of 
26.7 seconds per frame that it required in the experiments, the GMRF-based approach seems unfit for 
practical application to driver assistance using the currently available embedded-computing technology 
for vehicles. 



Table 6. Comparison of the computational time of lane detection and tracking. 



Approach 


Average Processing Time Per Frame (s) 


Frames Per Second 


Gradient 


0.09 


11.1 


Mean-shift 


0.18 


5.6 


Gabor filters 


0.34 


2.9 


GMRF 


26.7 


0.04 



The computational complexities presented in Table 5 do not include the cost of the MSAC variant 
of the RANSAC method for parameter estimation nor the cost associated to the extended Kalman filter, 
because the additional cost of these stages is the same regardless of the choice of pavement segmentation 
strategy. Moreover, on average these steps add less than 10 ms to the computation of each frame. 
However, the cost of the MSAC and extended Kalman filter is included in the experimental measurements 



Sensors 2013, 13 



3294 



of the computational cost presented in Table 6. For a discussion on the theoretical complexity of the 
RANSAC approach and the extended Kalman filter, the reader is referred to [63-66]. 

5. Conclusions 

This work proposed and assessed the effectiveness of using texture based segmentation or 
color-clustering methods for robust lane detection and tracking. The results showed that the road 
can be extracted robustly using the mean-shift method for clustering regions of similar color despite 
shadows and other variability in road materials, structure and illumination (cf. results highlighted 
in Table 3). Texture features also improved the effectiveness of the classic methods based on 
edge-detection. However, obtaining textures is computationally demanding when compared with the 
mean-shift color clustering, and therefore textures are less adequate for applications that require high 
frame processing rates. 

This work also proposed an approach to robustly determine the road geometry despite the presence 
of occlusions and shadows, which typically introduce discontinuities in the extracted lane boundaries. 
To achieve this improvement the standard clothoid road model is approximated by its power series 
expansion of order three. Obtaining the parameters of the cubic polynomials corresponding to the 
correct lane geometry can be carried out successfully despite outliers generated by such disturbances as 
shadows and occlusions using the MSAC variant of the RANSAC robust parameter estimation approach. 
The experiments showed that a quadratic model is enough for practical purposes, and that the cubic 
polynomial model does not yield significantly better results. 

The results confirmed that further improvements to lane sensing and departure warning are possible 
using an Extended Kalman filter to predict the lane location on subsequent frames whenever there are 
severe occlusions. It is to be noted that the proposed approach performs the lane tracking in the 3D 
space of motion using a single camera and not in the 2D image space as most of the existing lane tracking 
approaches do. This has the advantage of enabling a more robust estimation of the road geometry and the 
car's position relative to the lane. Tracking points in 3D space requires the inverse perspective mapping 
of points in the 2D image space back to the 3D space, which is shown to be possible thanks to the 
knowledge of the height and tilt angle at which the camera is mounted on the roof of the vehicle; however, 
in general the inverse perspective projection would require a multi-view approach using more than one 
camera. Ongoing research is concerned with the application of the proposed lane sensing approach, and 
in particular the analysis of lane outliers, to the challenging problem of intersection recognition. Also 
the ongoing research considers the integration of LADAR measurements to obtain an accurate 3D model 
of the road geodesic and torsional curvature using a standard Darboux frame, thus to overcome some 
limitations imposed by the locally flat road model in the back projection of image points to the 3D space 
for points far in the horizon. 
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